I will be trying to find accelerating/decelerating dealer usage.
library(tidyverse)
library(ggplot2)
library(lubridate)
library(plotly)
data <- read.csv("20230921 appraisal activity.csv")
head(data)
## create_date vin odometer dealer_id
## 1 2023-09-21 22:26:21 5FNRL5H97GB107752 46663.00000000 31233
## 2 2023-09-21 22:26:16 KM8J3CAL4MU408744 35644.00000000 1824
## 3 2023-09-21 22:26:07 1C4RDJDG3MC642161 60000.00000000 4562
## 4 2023-09-21 22:26:06 JTMRFREV5FD135295 130847.00000000 30295
## 5 2023-09-21 22:26:04 JM1NC25F570132039 34234.00000000 153
## 6 2023-09-21 22:26:01 5FNYF6H50LB051105 18784.00000000 218
## dealer_name dealer_contact_province dealer_contact_postal
## 1 Jim Falk Lexus of Beverly Hills CA 90212
## 2 UNIQUE AUTO SALES, LLC GA 31004
## 3 Tesla Motors BC V7T 2Z3
## 4 Woodhouse Auto Exchange NE 68118
## 5 John Wolfe TX 76180
## 6 Galves Market Data NJ 07608
## country source_type ph_year ph_manuf ph_model ph_body
## 1 1 6 2016 HONDA ODYSSEY TOURING 4 DOOR WAGON 7P
## 2 1 1 2021 HYUNDAI TUCSON SPORT 4 DOOR SUV
## 3 2 1 2021 DODGE DURANGO GT 4 DOOR SUV
## 4 1 6 2015 TOYOTA RAV4 XLE 4 DOOR SUV
## 5 1 1 2007 MAZDA MIATA GRAND TOURING CONVERTIBLE
## 6 1 1 2020 HONDA PILOT EX-L AWD 4 DOOR SUV
## ph_engtype trade market source
## 1 3.5L V6 19875 21225 sirius appraisal
## 2 2.4L 4 CYL 19550 20950 sirius appraisal
## 3 3.6L V6 35150 37225 sirius appraisal
## 4 2.5L 4 CYL 8050 9250 Dealer Website
## 5 2.0L 4 CYL 9400 10450 sirius appraisal
## 6 3.5L V6 30475 32025 sirius appraisal
Let’s take a deeper look into the data and see which variables will be useful in helping us see which dealers are increasing or decreasing usage of the tool.
colnames(data)
## [1] "create_date" "vin"
## [3] "odometer" "dealer_id"
## [5] "dealer_name" "dealer_contact_province"
## [7] "dealer_contact_postal" "country"
## [9] "source_type" "ph_year"
## [11] "ph_manuf" "ph_model"
## [13] "ph_body" "ph_engtype"
## [15] "trade" "market"
## [17] "source"
which(rowSums(is.na(data)) > 0)
## [1] 2196912
create_date: useful to gather a time period of data
input.
dealer_name: so we know which dealer is which.
dealer_id: same use
This should be enough to find out which dealers are accelerating and decelerating usage. We also notice that a single row, 2196912 is NA. we will remove this row.
print(data[2196912,])
## create_date vin odometer dealer_id dealer_name dealer_contact_province
## 2196912 2023-04- NA
## dealer_contact_postal country source_type ph_year ph_manuf ph_model
## 2196912 NA NA NA
## ph_body ph_engtype trade market source
## 2196912 NA NA
data <- na.omit(data)
We have removed the missing row.
Let’s select our desired cols, and convert to proper type.
dealer <- data |>
select(create_date, dealer_id, dealer_name) |> # select wanted cols
mutate(create_date = as.Date(create_date)) # convert to date format
Great, now we can parse each date to weekly intervals. By R defaults, weeks start on Monday.
week_frame <- dealer |>
mutate(week = week(create_date))
Let’s take a look at what we have so far.
head(week_frame)
## create_date dealer_id dealer_name week
## 1 2023-09-21 31233 Jim Falk Lexus of Beverly Hills 38
## 2 2023-09-21 1824 UNIQUE AUTO SALES, LLC 38
## 3 2023-09-21 4562 Tesla Motors 38
## 4 2023-09-21 30295 Woodhouse Auto Exchange 38
## 5 2023-09-21 153 John Wolfe 38
## 6 2023-09-21 218 Galves Market Data 38
Now that everything is nice and formatted, we can group by week and dealer to give a count of how many times each dealer used the tool that week.
dealer_grouped <- week_frame |>
group_by(week, dealer_id, dealer_name) |>
summarise(weekly_count = n())
head(dealer_grouped)
## # A tibble: 6 × 4
## # Groups: week, dealer_id [6]
## week dealer_id dealer_name weekly_count
## <dbl> <int> <chr> <int>
## 1 14 17 Robinson GMC 25
## 2 14 18 Accu-Trade 3
## 3 14 23 R Hollenshead Auto Sales 1510
## 4 14 70 Great Northern Auction 28
## 5 14 82 Towne Mazda 4
## 6 14 153 John Wolfe 10109
Week 14 is the start week. Data begins in April, the 14th week of the year.
Visualizing 5000+ different dealers can be tough. What we can do is find out n dealers who have had the largest differences in usage from the beginning to the end of the time period. First we need to re-arrange our data a tad bit.
usage_data <- dealer_grouped |>
arrange(dealer_id, week) |>
group_by(dealer_id)
head(usage_data)
## # A tibble: 6 × 4
## # Groups: dealer_id [2]
## week dealer_id dealer_name weekly_count
## <dbl> <int> <chr> <int>
## 1 31 15 Miller Hughes Ford Lincoln 1
## 2 14 17 Robinson GMC 25
## 3 15 17 Robinson GMC 44
## 4 16 17 Robinson GMC 51
## 5 17 17 Robinson GMC 57
## 6 18 17 Robinson GMC 64
Now we have a good format, where each dealer is in order and the weeks in which they used the tool are in order. Let’s go on to calculate weekly change.
usage_data <- usage_data |>
mutate(weekly_change = weekly_count - lag(weekly_count),
weekly_change = ifelse(is.na(weekly_change), 0, weekly_change))
head(usage_data)
## # A tibble: 6 × 5
## # Groups: dealer_id [2]
## week dealer_id dealer_name weekly_count weekly_change
## <dbl> <int> <chr> <int> <dbl>
## 1 31 15 Miller Hughes Ford Lincoln 1 0
## 2 14 17 Robinson GMC 25 0
## 3 15 17 Robinson GMC 44 19
## 4 16 17 Robinson GMC 51 7
## 5 17 17 Robinson GMC 57 6
## 6 18 17 Robinson GMC 64 7
Now, we know the difference in use between each week and each dealer. We can find the average of this to help us determine which dealers are accelerating/decelerating usage.
usage_data_avg <- usage_data |>
summarise(average_change = mean(weekly_change))
head(usage_data_avg)
## # A tibble: 6 × 2
## dealer_id average_change
## <int> <dbl>
## 1 15 0
## 2 17 0.36
## 3 18 0
## 4 23 47.6
## 5 70 1.32
## 6 82 0.32
Now we can see the top accelerating/decelerating dealers based off their average product usage.
# add back in dealer_name
# create reference frame
ref <- dealer |>
select(dealer_id, dealer_name) |>
distinct()
# merge
merged_df <- merge(usage_data_avg, ref, by = "dealer_id")
merged_df$dealer_name <- strtrim(merged_df$dealer_name, width = 35)
# gather top and bottom 5 (changed to 10)
top_10 <- merged_df |>
arrange(desc(average_change)) |>
head(10)
bottom_10 <- merged_df |>
arrange(average_change) |>
head(10)
head(top_10)
## dealer_id average_change dealer_name
## 1 32233 123.50000 AE of Miami
## 2 32158 122.16667 Elco Chevrolet Cadillac
## 3 153 84.36000 John Wolfe
## 4 10380 83.72000 Cameron Motorsports
## 5 22292 73.36000 Carriage Traders
## 6 32238 66.33333 Crown Motors Ltd. - Buick Cadillac
head(bottom_10)
## dealer_id average_change dealer_name
## 1 955 -37.33333 Demo Dealer Portal
## 2 28998 -34.60000 Germain Honda of BeaverCreek
## 3 30564 -32.50000 Volvo of Lisle
## 4 26562 -30.50000 Test Dealership FL
## 5 29107 -26.08696 Germain Buying Center Dayton
## 6 6341 -24.75000 Bo Beuckman Ford
acceleration_plot <- top_10 |>
ggplot(aes(x = reorder(dealer_name, -average_change),
y = average_change,
fill = average_change)) +
geom_bar(stat = "identity") +
scale_fill_gradient(low = 'blue', high = 'purple') +
labs(x = 'Dealer', y = 'Average Weekly Change',
title = 'Top 10 Accelerating Dealers') +
theme_minimal() +
theme(axis.text.y = element_text(angle = 0, vjust = 0.5, hjust = 1),
axis.text.x = element_text(angle = 40, vjust = 0.5, size = 7))
acceleration_plot
deceleration_plot <- bottom_10 |>
ggplot(aes(x = reorder(dealer_name, -average_change),
y = average_change,
fill = average_change)) +
geom_bar(stat = "identity", position = "identity") +
scale_fill_gradient(low = 'blue', high = 'purple') +
labs(x = 'Dealer', y = 'Average Weekly Change',
title = 'Top 10 Decelerating Dealers') +
theme_minimal() +
theme(axis.text.y = element_text(angle = 0, vjust = 0.5, hjust = 1),
axis.text.x = element_text(angle = 40, vjust = 0.5, size = 7))
deceleration_plot
These two plots portray the top accelerating/decelerating dealers. In Top 10 Accelerating Dealers, we can see that AE of Miami increased their product usage by an average of 123 times each week. In Top 10 Decelerating Dealers, we can see that Demo Dealer Portal decreased their average usage by 37 times each week. This doesn’t seem to be a real dealer so Germain Honda of Beaver Creek was probably the top decelerator with an average decrease of 34.5 less inputs each week. Top decelerators are important to notice because it could lead to a loss of customer.
We can track the usage across all weeks for the top accelerators/decelerators. This will help us gauge which week was the most influential and how long they actually used the product for. These visuals are interactive and have some features like comparing stats if you click the two stacked bars.
Since a line plot will get too confusing, we will narrow down to top 5 accelerators and decelerators.
top_accelerating_dealers <- top_10[1:5, ] |>
select(dealer_id, average_change) |>
merge(usage_data, by = "dealer_id") |>
arrange(dealer_id, week)
acceleration_plot <- plot_ly(top_accelerating_dealers,
x = ~week, y = ~weekly_change,
color = ~factor(dealer_id),
text = ~paste(dealer_name, "<br>Average Change:", round(average_change, 2))) |>
add_lines() |>
layout(
title = "Weekly Change for Top 5 Accelerators",
xaxis = list(title = "Week"),
yaxis = list(title = "Weekly Change"),
showlegend = FALSE,
annotations = list(
text = "Notice: This is an interactive plot. There are<br> some shorter lines at weeks 33 and 35.<br> You can zoom in on these.",
x = 0.85, # Adjust the x-coordinate for caption placement
y = 0.1, # Adjust the y-coordinate for caption placement
xref = "paper",
yref = "paper",
showarrow = FALSE
)
)
acceleration_plot
Some of our top accelerators overall have just began using our tool. Others have been around for the whole time period.
top_decelerating_dealers <- bottom_10[1:5, ] |>
select(dealer_id, average_change) |>
merge(usage_data, by = "dealer_id") |>
arrange(dealer_id, week)
deceleration_plot <- plot_ly(top_decelerating_dealers,
x = ~week, y = ~weekly_change,
color = ~factor(dealer_id),
text = ~paste(dealer_name, "<br>Average Change:", round(average_change, 2))) |>
add_lines() |>
layout(
title = "Weekly Change for Top 5 Decelerators",
xaxis = list(title = "Week"),
yaxis = list(title = "Weekly Change"),
showlegend = FALSE,
annotations = list(
text = "Dealers who fall off the chart early<br>have already cancelled. Germain is our most<br>likely client to cancel due to a decreasing trend of use.",
x = 0.8,
y = 0.1,
xref = "paper",
yref = "paper",
showarrow = FALSE
)
)
deceleration_plot
It is important to track which dealers are accelerating/decelerating so we can catch them before they cancel. This project could be integrated with current data to see which dealers are increasing every week. We can even go further with more data to see why they might be using the tool less each week.